Bibliography

for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, pages 102–108, 2022.

[41] De Cheng, Yihong Gong, Sanping Zhou, Jinjun Wang, and Nanning Zheng. Person re-identification by multi-channel parts-based CNN with improved triplet loss function. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1335–1344, 2016.

[42] Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry. Neural gradients are near-lognormal: Improved quantized and sparse training. arXiv preprint arXiv:2006.08173, 2020.

[43] Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. PACT: Parameterized clipping activation for quantized neural networks. arXiv preprint arXiv:1805.06085, 2018.

[44] Yunjey Choi, Minje Choi, Munyoung Kim, Jung-Woo Ha, Sunghun Kim, and Jaegul Choo. StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8789–8797, 2018.

[45] Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. What does BERT look at? An analysis of BERT's attention. arXiv preprint arXiv:1906.04341, 2019.

[46] Benoît Colson, Patrice Marcotte, and Gilles Savard. An overview of bilevel optimization. Annals of Operations Research, 153(1):235–256, 2007.

[47] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. Training deep neural networks with low precision multiplications. arXiv preprint arXiv:1412.7024, 2014.

[48] Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. BinaryConnect: Training deep neural networks with binary weights during propagations. Advances in Neural Information Processing Systems, 28, 2015.

[49] Richard Crandall and Carl Pomerance. Prime numbers. Springer, 2001.

[50] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. Universal transformers. In International Conference on Learning Representations, 2019.

[51] Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, and Lukasz Kaiser. Universal transformers. arXiv preprint arXiv:1807.03819, 2018.

[52] Alessio Del Bue, Joao Xavier, Lourdes Agapito, and Marco Paladini. Bilinear modeling via augmented Lagrange multipliers (BALM). IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(8):1496–1508, 2011.

[53] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.

[54] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.